Abstract — 'Tree pruning' (TP) is an algorithm for probabilistic inference on binary Markov random fields. It was recently derived by Dror Weitz and used to construct the first fully polynomial approximation scheme for counting independent sets up to the 'tree uniqueness threshold.' It can be regarded as a clever method for pruning the belief propagation computation tree in such a way as to account exactly for the effect of loops.

In this paper we generalize the original algorithm to make it suitable for decoding linear codes, and discuss various schemes for pruning the computation tree. Further, we present the outcomes of numerical simulations on several linear codes, showing that tree pruning allows one to interpolate continuously between belief propagation and maximum a posteriori decoding. Finally, we discuss theoretical implications of the new method.

I. Introduction 

Statistical inference is the task of computing marginals (or expectation values) of complex multivariate distributions. Belief propagation (BP) is a generic method for accomplishing this task quickly but approximately, when the multivariate distribution factorizes according to a sparse graphical structure. The advent of sparse graph codes and iterative BP decoding [1] has made decoding an important instance of this general problem. The present paper builds on this connection by 'importing' an algorithm that has been recently developed in the context of approximate counting and inference [2].

We will refer to the new algorithm as tree pruning (TP) decoding. For a number of reasons the application of this method to decoding is non-trivial. However, it is an interesting approach for the following three reasons. (i) It provides a sequence of decoding schemes that interpolates continuously between BP and the optimal maximum a posteriori (MAP) decoding. (ii) At each level of this sequence, the effect of loops of increasing length is taken into account. (iii) We expect that an appropriate truncation of this sequence might yield a polynomial algorithm for MAP decoding on general graphs of bounded degree, for low enough noise levels. Preliminary numerical results are encouraging.

A. Qualitative Features and Relation to BP Decoding 

As for BP decoding, TP decoding aims at estimating the a posteriori marginal probabilities of the codeword bits. Unfortunately, the relation between BP estimates and the actual marginals is in general poorly understood. In the case of random Low-Density Parity-Check (LDPC) codes and communication over memoryless channels, density evolution shows that, at small enough noise level, the BP bit error probability becomes arbitrarily small if the blocklength is large enough. This implies that the distance between BP estimates and the actual marginals vanishes as well. This result is not completely satisfactory, in that it relies in a crucial way on the locally tree-like structure of sparse random graphs. This property does not hold for structured graphs and, even for random graphs, it kicks in only at very large blocklengths.

In contrast to this, the algorithm considered in this paper accounts systematically for short loops. It should therefore perform better, in particular in the error floor regime, since this regime is dominated by small error events [3].

A convenient way of understanding the difference between BP and MAP decoding makes use of the so-called computation tree. Consider a code described by a factor graph $G = (V, F, E)$, where V represents the variable nodes, F the factor nodes, and E the edges. Let $i \in V$; then the corresponding computation tree, denoted by T(i), is the tree of non-reversing walks in G that start at i. This gives a graph (tree) structure in a natural way: two nodes are neighbors if one is obtained from the other by adding a step.
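The computation tree can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a simple graph stored as an adjacency dictionary and truncates T(i) at a given depth (the truncated tree T(i;t) that BP effectively uses). The function name `computation_tree` and the example graph are hypothetical choices of ours.

```python
def computation_tree(adj, root, depth):
    """Return the nodes of T(root; depth) as tuples (walks) of at most `depth` steps.

    adj: dict mapping each vertex of a simple graph to a list of its neighbours.
    Two walks are parent and child in the tree when one extends the other by one step.
    """
    walks = [(root,)]
    frontier = [(root,)]
    for _ in range(depth):
        nxt = []
        for w in frontier:
            for v in adj[w[-1]]:
                if len(w) >= 2 and v == w[-2]:
                    continue  # non-reversing: never step straight back along the last edge
                nxt.append(w + (v,))
        walks.extend(nxt)
        frontier = nxt
    return walks


# Hypothetical example: repetition code of length 3 with checks a = x0+x1, b = x1+x2,
# c = x0+x2, whose Tanner graph is the single cycle x0-a-x1-b-x2-c-x0.
adj = {"x0": ["a", "c"], "a": ["x0", "x1"], "x1": ["a", "b"],
       "b": ["x1", "x2"], "x2": ["b", "c"], "c": ["x2", "x0"]}
tree = computation_tree(adj, "x0", 8)
print(len(tree))  # 17 = 1 + 2*8: on a cycle every non-root node has exactly one child
```

On a cycle the walks wrap around indefinitely (the walk x0, a, x1, b, x2, c, x0 is an ordinary node of T(x0)), which is precisely how the computation tree 'forgets' that a loop has been closed.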

BP uses the marginal at the root of T(i) as an estimate 
for the marginal distribution at i on the original graph G. 
If G contains short loops in the neighborhood of i, the 
computation tree differs from G in a neighborhood of the 
root and, as a consequence, the BP estimate can differ vastly 
from the actual marginal. 

Weitz [2] made the surprising remark that there exists a simple way of pruning the computation tree (and fixing some of its variables) in such a way that the resulting root marginal coincides with the marginal on G. Unfortunately, the size of the pruned tree, which we call the self-avoiding walk tree and denote by SAW(i), is exponential in the size of the original graph G. Nevertheless, the tree can be truncated, thus yielding a convergent sequence of approximations for the marginal at i. The complexity of the resulting algorithm is linear in the size of the truncated tree. Its efficiency depends on how sensitive the root marginal is to the truncation depth.

B. Contributions and Outline 

Applying this approach to decoding linear codes poses several challenges:

(i) Weitz's construction is valid only for Markov random fields (MRFs) with pairwise interactions and binary variables. The decoding problem does not fit this framework.

(ii) The original justification for truncating the self-avoiding walk tree followed from the so-called 'strong spatial mixing' or 'uniqueness' condition. This amounts to saying that the conditional marginal at the root, given the variables at depth t, depends weakly on the values of the latter. This is (most of the time) false in decoding. For a 'good' code, the value of bit i in a codeword is completely determined by the values of bits outside a finite neighborhood around i.

(iii) Even worse, we found in numerical simulations that the original truncation procedure performs poorly in decoding.

The self-avoiding walk tree construction has already motivated several applications and generalizations in the past few months. Jung and Shah [5] discussed its relation with BP, and proposed a distributed implementation. Mossel and Sly [6] used it to estimate mixing times of Markov chain Monte Carlo algorithms. Finally, and most relevant to the problems listed above, Nair and Tetali [4] proposed a generalization to non-binary variables and multi-variable interactions. While this generalization does in principle apply to decoding, its complexity grows polynomially in the tree size. This makes it somewhat impractical in the present context.

In this paper we report progress on the three points above. Specifically, in Section II we use duality to rephrase decoding in terms of a generalized binary Markov random field. We then show how to generalize the self-avoiding walk tree construction to this context. In Section III we discuss the problems arising from the original truncation procedure, and describe two procedures that perform better. Numerical simulations are presented in Section IV. Finally, one of the most interesting perspectives is to use TP as a tool for analyzing BP and, in particular, for comparing it with MAP decoding. Some preliminary results in this direction are discussed in Section IV.

We should stress that a good part of our simulations concerns the binary erasure channel (BEC). From a practical point of view, TP decoding is not an appealing algorithm in this case. In fact, MAP decoding can be implemented in polynomial time through, for instance, Gaussian elimination. The erasure channel is nevertheless a good starting point for several reasons: (i) comparison with MAP decoding is accessible; (ii) we can find a particularly simple truncation scheme in the erasure case; (iii) some subtle numerical issues that exist for general channels disappear for the BEC.

II. Decoding through the Self-Avoiding Walk Tree

Throughout this paper we consider binary linear codes of blocklength n used over a binary-input memoryless channel. Let BM(ε), where ε is a noise parameter, denote a generic channel. Assume that $\mathcal{Y}$ is the output alphabet and let $\{Q(y|x) : x \in \{0,1\},\ y \in \mathcal{Y}\}$ denote its transition probability.

With a slight abuse of terminology we shall identify a code with a particular parity-check matrix H that represents it,

$$\{x \in \{0,1\}^n : Hx = 0 \bmod 2\}.$$



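Therefore, the code is further identified with a Tanner graph $G = (V, F, E)$ whose adjacency matrix is the parity-check matrix H. We will denote by $\partial a = \{i \in V : (i,a) \in E\}$ the neighborhood of the function (check) node a, and write $\partial a = (i_1(a), \dots, i_{|\partial a|}(a))$. Analogously, $\partial i = \{a \in F : (i,a) \in E\}$ indicates the neighborhood of the variable node i. The conditional distribution of the input $\underline{x}$ given the channel output $\underline{y}$ factorizes according to the graph G (also called factor graph). Indeed, for uniformly distributed codewords it follows immediately from Bayes rule that $\mathbb{P}\{\underline{X} = \underline{x} \,|\, \underline{Y} = \underline{y}\} = \mu^y(\underline{x})$, where

$$\mu^y(\underline{x}) = \frac{1}{Z(\underline{y})} \prod_{i \in V} Q(y_i|x_i) \prod_{a \in F} \mathbb{I}\big(x_{i_1(a)} + \cdots + x_{i_{|\partial a|}(a)} = 0 \bmod 2\big). \tag{1}$$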

We denote by $\mu_i^y(x_i) = \mathbb{P}\{X_i = x_i \,|\, \underline{Y} = \underline{y}\}$ the marginal distribution at bit i. Symbol MAP decoding amounts to the following prescription:

$$\hat{x}_i^{\rm MAP}(\underline{y}) = \arg\max_{x_i \in \{0,1\}} \mu_i^y(x_i).$$

Both BP and TP decoders have the same structure, whereby the marginal $\mu_i^y(\cdot)$ is replaced by its approximation, respectively $\nu_i^{\rm BP}(\cdot)$ or $\nu_i^{\rm TP}(\cdot)$.
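To make Eq. (1) and the symbol MAP rule concrete, the following Python sketch computes the posterior marginals of a very small code by brute-force enumeration of all $2^n$ words (feasible only for tiny n). The parity-check matrix, the binary symmetric channel and the received word are hypothetical examples of ours, not taken from the paper.

```python
import itertools

import numpy as np


def posterior_marginals(H, lik):
    """Brute-force posterior marginals mu_i^y(x_i) of Eq. (1) for a small code.

    H:   (m, n) 0/1 parity-check matrix.
    lik: (n, 2) array with lik[i, x] = Q(y_i | x) for the received word y.
    """
    m, n = H.shape
    marg = np.zeros((n, 2))
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        if np.any(H.dot(x) % 2):             # parity-check indicators in Eq. (1)
            continue
        w = np.prod(lik[np.arange(n), x])    # prod_i Q(y_i | x_i)
        marg[np.arange(n), x] += w
    return marg / marg.sum(axis=1, keepdims=True)   # normalization plays the role of 1/Z(y)


def map_decode(marg):
    """Symbol MAP decision: x_i = argmax over {0, 1} of mu_i^y(x_i)."""
    return marg.argmax(axis=1)


# Hypothetical example: length-3 repetition code over a BSC with crossover probability 0.1.
H = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
p, y = 0.1, np.array([0, 1, 0])
lik = np.where(np.arange(2)[None, :] == y[:, None], 1 - p, p)
marg = posterior_marginals(H, lik)
print(marg[:, 0], map_decode(marg))   # P(x_i = 0 | y) = 0.9 for every i; decision 000
```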

A. Duality and Generalized Markov Random Field 

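We call a generalized Markov Random Field (gMRF) over the finite alphabet $\mathcal{X}$ a couple $(\mathcal{G}, \psi)$, where $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is an ordinary graph over vertex set $\mathcal{V}$ and edge set $\mathcal{E}$. Further, $\psi = \{\psi_{ij} : (i,j) \in \mathcal{E}\} \cup \{\psi_i : i \in \mathcal{V}\}$ is a set of weights indexed by edges and vertices in $\mathcal{G}$, with $\psi_{ij} : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ and $\psi_i : \mathcal{X} \to \mathbb{R}$. Notice that, unlike for ordinary MRFs, the edge weights in generalized MRFs are not required to be non-negative.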

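Given a subset $A \subseteq \mathcal{V}$, the marginal of the gMRF $(\mathcal{G}, \psi)$ on A is defined as the function $\omega_A : \mathcal{X}^A \to \mathbb{R}$, with entries

$$\omega_A(x_A) = \sum_{\{x_j : j \notin A\}} \prod_{(l,k) \in \mathcal{E}} \psi_{lk}(x_l, x_k) \prod_{l \in \mathcal{V}} \psi_l(x_l). \tag{2}$$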

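When $A = \mathcal{V}$, we shall omit the subscript and call $\omega(x)$ the weight of configuration x. More generally, the expectation of a function $f : \mathcal{X}^{\mathcal{V}} \to \mathbb{R}$ can be defined as

$$\omega(f) = \sum_{x} f(x) \prod_{(l,k) \in \mathcal{E}} \psi_{lk}(x_l, x_k) \prod_{l \in \mathcal{V}} \psi_l(x_l). \tag{3}$$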

Notice that $\omega(\cdot)$ is not (and in general cannot be) normalized. In the sequel, whenever the relevant MRF has non-negative weights and is normalizable, we shall use the words 'expectation' and 'marginal' in the usual (normalized) sense.

Duality can be used to reformulate decoding (in particular, the posterior marginals $\mu_i^y(x_i)$) in terms of a gMRF. More precisely, given a code with Tanner graph $G = (V, F, E)$, we define a gMRF on the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = V \cup F$ and $\mathcal{E} = E$, proceeding as follows. We let $\mathcal{X} = \{0,1\}$ and associate variables $x_i \in \{0,1\}$ to $i \in V$ and $u_a \in \{0,1\}$ to $a \in F$. We then introduce the weights

$$\forall i \in V,\ \ \psi_i(x_i) = Q(y_i|x_i), \qquad \forall a \in F,\ \ \psi_a(u_a) = 1, \tag{4}$$

$$\forall (i,a) \in E,\ \ \psi_{ia}(x_i, u_a) = (-1)^{x_i u_a}. \tag{5}$$



Fig. 1. Tanner graph for a repetition code of length 3. 

Although the next statement follows from general duality theory, it is convenient to spell it out explicitly.

Lemma 1. The marginals of the a posteriori distribution defined in Eq. (1) are proportional to those of the gMRF defined in Eq. (4) and Eq. (5). More precisely, we get

$$\mu_i^y(x_i) = \omega_i(x_i)/[\omega_i(0) + \omega_i(1)].$$

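Proof. It is immediate to prove a stronger result, namely that the distribution $\mu^y(\underline{x})$ is proportional to $\sum_{\underline{u}} \omega(\underline{x}, \underline{u})$, where $\omega(\underline{x}, \underline{u})$ is defined using Eq. (4) and Eq. (5). We have

$$\begin{aligned}
\sum_{\underline{u}} \omega(\underline{x}, \underline{u}) &= \sum_{\underline{u}} \prod_{i \in V} Q(y_i|x_i) \prod_{(i,a) \in E} (-1)^{x_i u_a} = \prod_{i \in V} Q(y_i|x_i) \prod_{a \in F} \sum_{u_a \in \{0,1\}} (-1)^{u_a \sum_{i \in \partial a} x_i} \\
&= \prod_{i \in V} Q(y_i|x_i) \prod_{a \in F} 2\, \mathbb{I}\Big(\sum_{i \in \partial a} x_i = 0 \bmod 2\Big),
\end{aligned}$$

which is proportional to the right-hand side of Eq. (1). □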
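Lemma 1 can be checked numerically on a toy code. The Python sketch below (function names and example are hypothetical) sums the signed weights of Eqs. (4)-(5) over all pairs $(\underline{x}, \underline{u})$ by brute force. Individual configurations carry negative weights, yet the resulting $\omega_i(\cdot)$ equal $2^{|F|}$ times the unnormalized posterior marginals, exactly as the proof predicts.

```python
import itertools

import numpy as np


def dual_gmrf_marginals(H, lik):
    """omega_i(x_i) for the gMRF of Eqs. (4)-(5), summed by brute force.

    Variable node i carries psi_i(x_i) = lik[i, x_i] = Q(y_i | x_i); check node a carries
    psi_a = 1 and a binary variable u_a; every edge (i, a) carries (-1)^(x_i * u_a).
    """
    m, n = H.shape
    omega = np.zeros((n, 2))
    for x in itertools.product((0, 1), repeat=n):
        x = np.array(x)
        node_w = np.prod(lik[np.arange(n), x])
        for u in itertools.product((0, 1), repeat=m):
            # product of the signed edge weights = (-1)^(sum over edges of u_a * x_i)
            sign = (-1) ** int(np.array(u) @ H @ x)
            omega[np.arange(n), x] += node_w * sign
    return omega


def unnormalized_posterior(H, lik):
    """Sum of prod_i Q(y_i | x_i) over codewords with x_i fixed (Eq. (1) without 1/Z)."""
    m, n = H.shape
    marg = np.zeros((n, 2))
    for x in itertools.product((0, 1), repeat=n):
        x = np.array(x)
        if not np.any(H @ x % 2):
            marg[np.arange(n), x] += np.prod(lik[np.arange(n), x])
    return marg


# Hypothetical example: length-3 repetition code (three checks) and random channel likelihoods.
H = np.array([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
lik = np.random.default_rng(1).uniform(0.1, 1.0, size=(3, 2))
omega = dual_gmrf_marginals(H, lik)
mu = unnormalized_posterior(H, lik)
print(np.allclose(omega, 2 ** H.shape[0] * mu))   # True
```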

This result, which derives from [9] and [8], motivates us to extend Weitz's construction to gMRFs.¹ This is the object of the next section.

B. The Self-Avoiding Walk Tree for Generalized Markov Random Fields

Assume we are given a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a node $i \in \mathcal{V}$. We have already described the computation tree rooted at i, which we denote by T(i).

An 'extended self-avoiding walk' (SAW) on a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, starting at $i \in \mathcal{V}$, is a non-reversing walk that never visits the same vertex twice, except, possibly, for its endpoint. The 'self-avoiding walk tree' rooted at $i \in \mathcal{V}$ is the tree of all extended self-avoiding walks on $\mathcal{G}$ starting at i. It is straightforward to see that SAW(i) is in fact a finite subtree of T(i). Its size is bounded by $(\Delta - 1)^{|\mathcal{V}|}$, where Δ is the maximum degree of $\mathcal{G}$ and $|\mathcal{V}|$ the number of nodes.

As an example, Figure 2 shows a SAW tree for the small graph $\mathcal{G}$ depicted in Figure 1. (In this case, $\mathcal{G}$ is the Tanner graph of a repetition code of length 3.) If we denote by $\mathcal{V}(i)$ the vertex set of SAW(i), there exists a natural projection $\pi : \mathcal{V}(i) \to \mathcal{V}$ that preserves edges. Formally, π maps a self-avoiding walk to its endpoint.

Notice that SAW(i) has two types of leaf nodes: (i) nodes that are leaves in the original graph $\mathcal{G}$; (ii) nodes that are not leaves in $\mathcal{G}$ but correspond to extended self-avoiding walks that cannot be further continued. The latter case arises when the endpoint of the self-avoiding walk has already been visited (i.e., when a loop is closed). We shall refer to nodes of the second type as terminated nodes. Indeed, the self-avoiding walk tree SAW(i) can be obtained from T(i) by the following termination procedure. Imagine descending T(i) along one of its branches. When the same projection is encountered for the second time, terminate the branch. Formally, this means eliminating all the descendants of u whenever $\pi(u) = \pi(v)$ for some ancestor v of u.

Fig. 2. Self-avoiding walk tree SAW(i) for the Tanner graph of Figure 1, rooted at variable node i = 0. In this picture, each node of the self-avoiding walk tree is labeled by its projection onto $\mathcal{V}$. At 'terminated' nodes we marked the value that the variable is forced to take.

¹More explicitly, we can think of implementing a binary Fourier involution, as first proposed in [9] and [8], on the graph edges (as later reported in Eq. (13)), while processing all graph vertices in a similar way.
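A direct way to enumerate SAW(i) is a depth-first search over non-reversing walks that stops as soon as a vertex is revisited, which is the termination procedure just described. The Python sketch below assumes a simple graph given as an adjacency dictionary; the function name and the 6-cycle example (a length-3 repetition code with three checks, our own choice of Tanner graph) are hypothetical.

```python
def saw_tree(adj, root):
    """Enumerate SAW(root): all extended self-avoiding walks starting at root.

    adj: dict mapping each vertex of a simple graph to a list of its neighbours.
    Returns (walks, terminated). A walk is terminated when its endpoint was already
    visited, i.e. when it closes a loop; terminated walks get no children.
    """
    walks, terminated = [], []

    def extend(w):
        walks.append(w)
        if len(w) > 1 and w[-1] in w[:-1]:
            terminated.append(w)          # loop closed: terminated node
            return
        for v in adj[w[-1]]:
            if len(w) >= 2 and v == w[-2]:
                continue                  # non-reversing
            extend(w + (v,))

    extend((root,))
    return walks, terminated


# Hypothetical example: Tanner graph x0-a-x1-b-x2-c-x0 of a length-3 repetition code.
adj = {"x0": ["a", "c"], "a": ["x0", "x1"], "x1": ["a", "b"],
       "b": ["x1", "x2"], "x2": ["b", "c"], "c": ["x2", "x0"]}
walks, terminated = saw_tree(adj, "x0")
print(len(walks), terminated)   # 13 nodes; two terminated walks, one per direction around the loop
```

Unlike T(x0), this tree is finite: each of the two branches stops after going once around the loop.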

Given a gMRF $(\mathcal{G}, \psi)$, we can define a gMRF on SAW(i) in the usual way. Namely, to any edge $(u, v) \in$ SAW(i), we associate a weight coinciding with the one of the corresponding edge in $\mathcal{G}$: $\psi_{u,v}(x_u, x_v) = \psi_{\pi(u),\pi(v)}(x_u, x_v)$. The analogous definition is repeated for any non-terminated node: $\psi_u(x_u) = \psi_{\pi(u)}(x_u)$. Finally, the choice of weight on terminated nodes makes use of the hypothesis that $\mathcal{X} = \{0,1\}$. Assume that the edges of $\mathcal{G}$ are labeled using a given order, e.g., a lexicographic order. Let u be a terminated node with $\pi(u) = j$. Then the self-avoiding walk corresponding to u contains a loop that starts and ends at j. We let $\psi_u(x_u) = \mathbb{I}(x_u = 0)$ (respectively, $\psi_u(x_u) = \mathbb{I}(x_u = 1)$) if this loop quits j along an edge of higher (respectively, lower) order than the one along which it enters j. The relevance of this construction is due to Weitz, who considered² the case of permissive binary MRFs. By this we mean that $\psi_{kl}(x_k, x_l) \ge 0$, $\psi_k(x_k) \ge 0$ and, for any $k \in \mathcal{V}$, there exists $x_k^* \in \{0,1\}$ such that $\psi_k(x_k^*) > 0$ and $\psi_{kl}(x_k^*, x_l) > 0$ for any l with $(k,l) \in \mathcal{E}$ and any $x_l \in \{0,1\}$. (The latter is referred to as the 'permissivity' condition.)
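The rule for the weights of terminated nodes fits in a small function. In this Python sketch the edge order is lexicographic, as suggested above; the helper names and the 6-cycle example are hypothetical.

```python
def lexicographic_rank(adj):
    """Rank every undirected edge of the graph by the lexicographic order of its endpoints."""
    edges = sorted({tuple(sorted((a, b))) for a in adj for b in adj[a]})
    return {frozenset(e): r for r, e in enumerate(edges)}


def forced_value(walk, edge_rank):
    """Value forced at a terminated SAW node (psi_u(x_u) = I(x_u = value)).

    walk: a terminated extended self-avoiding walk, whose endpoint j occurs twice.
    The loop quits j along walk[first] -> walk[first + 1] and enters j along walk[-2] -> j.
    """
    j = walk[-1]
    first = walk.index(j)                          # where the loop starts at j
    quits = edge_rank[frozenset((j, walk[first + 1]))]
    enters = edge_rank[frozenset((walk[-2], j))]
    return 0 if quits > enters else 1


adj = {"x0": ["a", "c"], "a": ["x0", "x1"], "x1": ["a", "b"],
       "b": ["x1", "x2"], "x2": ["b", "c"], "c": ["x2", "x0"]}
rank = lexicographic_rank(adj)
print(forced_value(("x0", "a", "x1", "b", "x2", "c", "x0"), rank))  # 1: quits along (a, x0), the lowest edge
print(forced_value(("x0", "c", "x2", "b", "x1", "a", "x0"), rank))  # 0: quits along (c, x0), enters along (a, x0)
```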

²Weitz [2] considered the independent set problem but remarked that his construction generalizes to a larger class of MRFs. Jung and Shah [5] studied this generalization. Nair and Tetali [4] discussed the case of 'hard-core' interactions (positively alignable) as well.



Proposition 1 (Weitz). Given a permissive binary MRF $(\mathcal{G}, \psi)$, the marginal of $x_i$ with respect to $(\mathcal{G}, \psi)$ is proportional to the root marginal on SAW(i).
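Proposition 1 can be checked numerically on a small permissive MRF (all weights strictly positive, so permissivity holds trivially). The Python sketch below builds SAW(i) implicitly by recursion, applies the termination rule stated above with a lexicographic edge order, runs the ordinary leaf-to-root recursion on the tree, and compares the result with brute-force marginals. Names, graph and random weights are our own illustrative choices.

```python
import itertools

import numpy as np


def edge_matrix(psi_e, k, l):
    """psi_{kl}(x_k, x_l) as a 2x2 array indexed [x_k, x_l]; psi_e is keyed by (min, max)."""
    return psi_e[(k, l)] if (k, l) in psi_e else psi_e[(l, k)].T


def saw_root_marginal(adj, psi_v, psi_e, root):
    """Normalized root marginal of the MRF defined on SAW(root)."""
    rank = {e: r for r, e in enumerate(sorted(psi_e))}      # global lexicographic edge order

    def order(a, b):
        return rank[(min(a, b), max(a, b))]

    def weight(walk):
        j = walk[-1]
        if len(walk) > 1 and j in walk[:-1]:                 # terminated node
            first = walk.index(j)
            forced = 0 if order(j, walk[first + 1]) > order(walk[-2], j) else 1
            return np.eye(2)[forced]                         # psi_u(x_u) = I(x_u = forced)
        out = np.array(psi_v[j], dtype=float)
        for v in adj[j]:
            if len(walk) >= 2 and v == walk[-2]:
                continue                                     # non-reversing
            out = out * (edge_matrix(psi_e, j, v) @ weight(walk + (v,)))
        return out

    w = weight((root,))
    return w / w.sum()


def brute_marginal(n, psi_v, psi_e, i):
    m = np.zeros(2)
    for x in itertools.product((0, 1), repeat=n):
        w = np.prod([psi_v[k][x[k]] for k in range(n)])
        for (k, l), M in psi_e.items():
            w *= M[x[k], x[l]]
        m[x[i]] += w
    return m / m.sum()


# Hypothetical permissive example: positive weights on a 4-node graph with two loops.
rng = np.random.default_rng(0)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
adj = {k: [] for k in range(4)}
for k, l in edges:
    adj[k].append(l)
    adj[l].append(k)
psi_v = [rng.uniform(0.5, 2.0, size=2) for _ in range(4)]
psi_e = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}
for i in range(4):
    print(i, saw_root_marginal(adj, psi_v, psi_e, i), brute_marginal(4, psi_v, psi_e, i))
```

Reversing the edge order (equivalently, swapping the roles of 0 and 1 at terminated nodes) gives the same marginals for positive weights; what matters is that one fixed order is used consistently throughout the tree.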

The problem with non-permissive MRFs and, a fortiori, with generalized MRFs is that the tree model SAW(i) may not admit any assignment of the variables such that all the weights $\psi_i(x_i)$, $\psi_{kl}(x_k, x_l)$ are non-negative. As a consequence, the MRF on SAW(i) does not define a probability distribution, and this invalidates the derivation in [2] or [5]. Even worse, the procedure used in these papers was based on keeping track of ratios among marginals, of the form $R_i = \mu_i(x_i = 0)/\mu_i(x_i = 1)$. When the MRF does not define a distribution, ill-defined ratios such as 0/0 can appear.

Let us stress that this problem is largely due to the 'termi- 
nation' procedure described above. This in fact constrains 
the set of assignments with non-vanishing weight to be 
compatible with the values assigned at terminated nodes. 

In order to apply the self-avoiding walk construction to gMRFs, we need to modify it in the following two ways.

(i) We add further structure to SAW(i). For any $u \in \mathcal{V}(i)$, let D(u) be the set of its children (i.e., the set of extended self-avoiding walks that are obtained by adding one step to u). Then we partition $D(u) = D_1(u) \cup \cdots \cup D_k(u)$ as follows. Let $v_1, v_2 \in D(u)$ be two children of u, and write them as $v_1 = (u, j_1)$, $v_2 = (u, j_2)$. Further, let $j = \pi(u)$. Then we write $v_1 \sim v_2$ if there exists an extended self-avoiding walk of the form $(u, j_1, u', j_2, j)$. Here we are regarding u, u' as walks on $\mathcal{G}$ (i.e., sequences of vertices) and we use $(u, w, w', \dots)$ to denote the concatenation of walks. It is not difficult to verify that ∼ is an equivalence relation. The partition $\{D_1(u), \dots, D_k(u)\}$ is defined to be the partition into equivalence classes under this relation.
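The partition of D(u) can be computed with a graph search. The Python sketch below relies on our own reading of the definition (an assumption, not a statement from the paper): a walk $(u, j_1, u', j_2, j)$ exists exactly when $j_1$ and $j_2$ are connected in $\mathcal{G}$ after deleting every vertex visited by u, so the classes are the connected components of that reduced graph, and children that close a loop are kept as singletons. This reading also makes the transitivity of ∼ evident. All names and examples are hypothetical.

```python
def partition_children(adj, walk):
    """Split the children of the SAW node `walk` into the classes D_1(u), ..., D_k(u).

    Children (u, j1) and (u, j2) are put in the same class iff j1 and j2 are connected
    in the graph after deleting every vertex visited by `walk`. Children that close a
    loop (terminated nodes) form singleton classes.
    """
    removed = set(walk)
    children = [v for v in adj[walk[-1]] if len(walk) < 2 or v != walk[-2]]
    comp = {}                                   # vertex -> representative of its component
    for s in children:
        if s in removed or s in comp:
            continue
        comp[s] = s
        stack = [s]
        while stack:                            # depth-first search in G minus `removed`
            x = stack.pop()
            for y in adj[x]:
                if y not in removed and y not in comp:
                    comp[y] = s
                    stack.append(y)
    classes = {}
    for v in children:
        key = ("terminated", v) if v in removed else comp[v]
        classes.setdefault(key, []).append(v)
    return list(classes.values())


# Hypothetical examples: the 6-cycle x0-a-x1-b-x2-c-x0 and a 3-node path.
cycle = {"x0": ["a", "c"], "a": ["x0", "x1"], "x1": ["a", "b"],
         "b": ["x1", "x2"], "x2": ["b", "c"], "c": ["x2", "x0"]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(partition_children(cycle, ("x0",)))   # [['a', 'c']]: both children lie on one loop
print(partition_children(path, (1,)))       # [[0], [2]]: no loop, two classes
```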

(ii) We define the generalized root marginal of SAW(i) through a recursive procedure that makes it always well-defined. First notice that, if T is a tree rooted at i, then the marginal at i can be computed by a standard message passing (dynamic programming) procedure, starting from the leaves and moving up to the root. The update rules are, for any node u and any child $v \in D(u)$,

$$\omega_u(x_u) = \psi_u(x_u) \prod_{v \in D(u)} \omega_{v \to u}(x_u), \tag{6}$$

$$\omega_{v \to u}(x_u) = \sum_{x_v} \psi_{uv}(x_u, x_v)\, \omega_v(x_v), \tag{7}$$

where edges are understood to be directed towards the root. The marginal at the root is obtained by evaluating the right-hand side of Eq. (6) with u = i.
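Eqs. (6)-(7) are the usual leaf-to-root dynamic programming. The Python sketch below (hypothetical names, random signed weights) checks them against brute-force summation on a small tree. Since only the distributive law is used, the recursion is exact on a tree even when some weights are negative, which is what makes it usable for gMRFs.

```python
import itertools

import numpy as np


def tree_root_weight(children, psi_v, psi_e, root):
    """Unnormalized root marginal omega_root(x) via Eqs. (6)-(7).

    children: dict node -> list of children; psi_v[u]: length-2 array psi_u(x_u);
    psi_e[(u, v)]: 2x2 array psi_uv(x_u, x_v) for each parent u and child v.
    """
    def omega(u):
        out = np.array(psi_v[u], dtype=float)
        for v in children.get(u, []):
            out = out * (psi_e[(u, v)] @ omega(v))   # message omega_{v->u}(x_u), Eq. (7)
        return out                                   # Eq. (6)
    return omega(root)


def brute_root_weight(nodes, psi_v, psi_e, root):
    w = np.zeros(2)
    for xs in itertools.product((0, 1), repeat=len(nodes)):
        x = dict(zip(nodes, xs))
        term = np.prod([psi_v[u][x[u]] for u in nodes])
        for (u, v), M in psi_e.items():
            term *= M[x[u], x[v]]
        w[x[root]] += term
    return w


# Hypothetical signed example on a 5-node tree.
rng = np.random.default_rng(2)
children = {"r": ["a", "b"], "a": ["c", "d"]}
nodes = ["r", "a", "b", "c", "d"]
psi_v = {u: rng.normal(size=2) for u in nodes}
psi_e = {(u, v): rng.normal(size=(2, 2)) for u, kids in children.items() for v in kids}
print(tree_root_weight(children, psi_v, psi_e, "r"), brute_root_weight(nodes, psi_v, psi_e, "r"))
```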

The generalized root marginal is defined by the same procedure but changing Eq. (6) as follows. Given the partition $D(u) = D_1(u) \cup \cdots \cup D_k(u)$ described above, we let

$$\omega_u(x_u) = \psi_u(x_u) \prod_{l=1}^{k} \omega_{D_l(u)}(x_u), \tag{8}$$

where we define $\omega_{D_l(u)}(x_u)$ through a concatenation procedure. Let $\omega^{(1)}(\cdot), \dots, \omega^{(r)}(\cdot)$ be the set of messages $\{\omega_{v \to u}(\cdot) : v \in D_l(u)\}$, ordered according to the order of the edges $(\pi(v), \pi(u))$ in $\mathcal{G}$. Then we let

$$\big(\omega_{D_l(u)}(0), \omega_{D_l(u)}(1)\big) = \big(\omega^{(r)}(0), \omega^{(1)}(1)\big). \tag{9}$$

The reason for calling this a 'concatenation' follows from the remark that, with the notations above, we have $\omega^{(2)}(1) = \omega^{(1)}(0)$, $\omega^{(3)}(1) = \omega^{(2)}(0)$, etc. We refer to the discussion (and proof) below for a justification of this claim. As a consequence, the procedure in Eq. (9) can be described as follows: write the components of $\omega^{(r)}(\cdot), \omega^{(r-1)}(\cdot), \dots, \omega^{(1)}(\cdot)$ in sequence, and eliminate repeated entries.
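The concatenation rule can be verified by brute force on a toy gMRF in which the root u has three edges, all lying in one component. The Python sketch below follows the forcing convention used in the proof below (split copies preceding copy s forced to 0, those following it forced to 1). It checks that consecutive messages share one entry and that Eq. (9) returns the exact weights of the unsplit component. The graph, the signed weights and the function names are illustrative assumptions, and the brute force is feasible only for tiny graphs.

```python
import itertools

import numpy as np


def split_weight(root_edges, psi_v, psi_e, copies):
    """Sum over the non-root variables, with root copy t (on root edge t) set to copies[t].

    root_edges: list of (root, neighbour) edges in the chosen order;
    psi_v: dict of node weights of the non-root nodes (psi_u of the root is factored out);
    psi_e: dict (k, l) -> 2x2 array psi_kl(x_k, x_l), root edges keyed as (root, neighbour).
    """
    root = root_edges[0][0]
    others = sorted(psi_v)
    total = 0.0
    for xs in itertools.product((0, 1), repeat=len(others)):
        x = dict(zip(others, xs))
        w = np.prod([psi_v[k][x[k]] for k in others])
        for (k, l), M in psi_e.items():
            if k == root:
                w *= M[copies[root_edges.index((k, l))], x[l]]   # edge to a split copy
            else:
                w *= M[x[k], x[l]]
        total += w
    return total


def concatenated(root_edges, psi_v, psi_e):
    """Messages omega^(s)(x) under the proof's forcing, and the pair returned by Eq. (9)."""
    r = len(root_edges)
    msgs = []
    for s in range(r):  # index s = 0, ..., r-1 stands for omega^(1), ..., omega^(r)
        msgs.append([split_weight(root_edges, psi_v, psi_e, [0] * s + [x] + [1] * (r - s - 1))
                     for x in (0, 1)])
    return (msgs[-1][0], msgs[0][1]), msgs


# Hypothetical example: root 0 joined to 1, 2, 3, which are linked by the path 1-2-3.
rng = np.random.default_rng(3)
root_edges = [(0, 1), (0, 2), (0, 3)]
psi_v = {k: rng.normal(size=2) for k in (1, 2, 3)}
psi_e = {e: rng.normal(size=(2, 2)) for e in root_edges + [(1, 2), (2, 3)]}
(est0, est1), msgs = concatenated(root_edges, psi_v, psi_e)
exact = [split_weight(root_edges, psi_v, psi_e, [x] * 3) for x in (0, 1)]
print(np.allclose([est0, est1], exact))   # True
```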

With this groundwork, we obtain the following generalization of Weitz's result.

Proposition 2. Given a gMRF $(\mathcal{G}, \psi)$, the marginal at $i \in \mathcal{V}$ with respect to $(\mathcal{G}, \psi)$ is equal to the generalized root marginal on SAW(i).

Proof. The proof is very similar to Weitz's original proof in [2]; the difference is that special care must be taken to avoid ill-defined expressions. The argument consists of progressively simplifying the graph $\mathcal{G}$ (rooted at i) until SAW(i) is obtained. We shall represent these simplifications graphically.

Consider the first step, corresponding to Eq. (8), with u = i. The partition of D(u) into $\{D_1(u), \dots, D_k(u)\}$ corresponds to a partition of the subgraph $\mathcal{G} \setminus i$ (obtained by eliminating from $\mathcal{G}$ the vertex i as well as its adjacent edges) into connected components. This correspondence is depicted below (whereby gray blobs correspond to connected subgraphs). After factoring out the term $\psi_u(x_u)$, the definition of marginal in Eq. (2) factorizes naturally on such components, leading to Eq. (8).

Consider now one of these components, call it $\mathcal{G}_l$, such as the one depicted below. The corresponding generalized root marginal is computed using the concatenation rule specified in Eq. (9). In order to derive this rule, first consider the graph $\mathcal{G}'_l$ obtained from $\mathcal{G}_l$ by replacing its root u by $k = \deg(u)$ copies $u^{(1)}, \dots, u^{(k)}$, each of degree 1 (here $\deg(v)$ denotes the degree of vertex v). Each of the newly introduced vertices is adjacent to one of the edges incident on the root in $\mathcal{G}_l$. Further, $u^{(1)}, \dots, u^{(k)}$ are labeled according to the ordering (chosen at the beginning of the reduction procedure) on the adjacent edges. These k nodes will be referred to as 'split nodes' in the sequel.

From the definition of marginal in Eq. (2), and using the notation ω' for the gMRF on $\mathcal{G}'_l$, we have

$$\omega_u(x) = \omega'_{u^{(1)}, \dots, u^{(k)}}(x, \dots, x) \tag{10}$$

for $x \in \{0,1\}$. This identity is represented as the first equality in the figure above.

Next we replace the graph $\mathcal{G}'_l$ by k copies of it, $H_1, \dots, H_k$. With a slight abuse of notation, we rename $u^{(1)}$ the first of the k 'split nodes' in $H_1$, $u^{(2)}$ the second in $H_2$, and so on. Further, we add node weights to the other 'split nodes' (i.e., the ones that remained unnamed), either of the form $\psi_v(x_v) = \mathbb{I}(x_v = 0)$ (forcing $x_v$ to take value 0) or of the form $\psi_v(x_v) = \mathbb{I}(x_v = 1)$ (forcing $x_v$ to take value 1). More precisely, for any $j \in \{1, \dots, k\}$, on $H_j$ we force to 0 those split nodes that come before $u^{(j)}$ and to 1 the ones that come after.

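As a consequence, if we use $\omega^{(j)}$ for the gMRF $H_j$, we have

$$\omega^{(j)}_{u^{(j)}}(x) = \omega'_{u^{(1)}, \dots, u^{(k)}}(\underbrace{0, \dots, 0}_{j-1}, x, \underbrace{1, \dots, 1}_{k-j}). \tag{11}$$

In particular, for any $j \in \{1, \dots, k-1\}$ we get $\omega^{(j+1)}_{u^{(j+1)}}(1) = \omega^{(j)}_{u^{(j)}}(0)$. As a consequence of this fact and of Eq. (10),

$$\big(\omega_u(0), \omega_u(1)\big) = \big(\omega^{(k)}_{u^{(k)}}(0), \omega^{(1)}_{u^{(1)}}(1)\big). \tag{12}$$

This proves Eq. (9) with $\omega^{(j)}(x) \equiv \omega^{(j)}_{u^{(j)}}(x)$ (second equality in the last figure above).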

Finally, Eq. (7) follows by considering the marginal of a node of degree 1, such as $u^{(1)}, \dots, u^{(k)}$ in graphs $H_1, \dots, H_k$, and expressing it in terms of the marginal of its only neighbor.

This completes one full step of the procedure that breaks the loops through node i. By recursively repeating the same steps, the graph is completely unfolded, giving rise to SAW(i). □

The self-avoiding walk tree SAW(i) appears as a conve- 
nient way to organize the calculation of the marginal at i 
in the general case. In the case of permissive MRFs this 
calculation coincides with a standard marginal calculation 
on the tree SAW(i). It is instructive to check this explicitly. 

Fact 1. Proposition 1 is a special case of Proposition 2 for permissive MRFs.

Proof. First notice that, for permissive MRFs, the self-avoiding walk tree construction yields an MRF on SAW(i) that defines a probability distribution (non-negative and normalizable), whose marginals will be denoted as ω as well. We have to prove that, in this case, the generalized root marginal is proportional to the ordinary marginal at the root of SAW(i). The crucial remark is that, because of permissivity, the messages are non-negative and, in particular, $\omega_{v \to u}(x^*) > 0$ and $\omega_v(x^*) > 0$.



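Assume, without loss of generality, that $x^* = 0$. We define the likelihood ratios on the SAW(i) tree $R_{v \to u} = \omega_{v \to u}(1)/\omega_{v \to u}(0)$, $R_v = \omega_v(1)/\omega_v(0)$ and $R_i = \omega_i(1)/\omega_i(0)$. The ratio $R_{D_l(u)}$ is defined analogously in terms of $\omega_{D_l(u)}(\cdot)$. Equation (7) then implies

$$R_{v \to u} = \frac{\psi_{uv}(1,0) + \psi_{uv}(1,1)\, R_v}{\psi_{uv}(0,0) + \psi_{uv}(0,1)\, R_v}. \tag{13}$$

Eq. (8) yields, on the other hand,

$$R_u = \frac{\psi_u(1)}{\psi_u(0)} \prod_{l=1}^{k} R_{D_l(u)}. \tag{14}$$

Finally, using the remark that $\omega^{(s+1)}(1) = \omega^{(s)}(0)$ for $s = 1, \dots, r-1$, we get from Eq. (9)

$$R_{D_l(u)} = \frac{\omega^{(1)}(1)}{\omega^{(r)}(0)} = \prod_{v \in D_l(u)} R_{v \to u}. \tag{15}$$

Putting the last two equations together,

$$R_u = \frac{\psi_u(1)}{\psi_u(0)} \prod_{v \in D(u)} R_{v \to u}. \tag{16}$$

It is now easy to check that Eq. (16) and Eq. (13) coincide with the appropriate recursive definition of probability marginal ratios on SAW(i). □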

Proposition 2 does not yield an efficient way of computing marginals of gMRFs. The conundrum is that the resulting complexity is linear in the size of SAW(i), which is in turn exponential in the size of the original graph $\mathcal{G}$. On the other hand, it provides a systematic way to define and study algorithms for computing such a marginal efficiently. The idea, proposed first in [2], is to deform SAW(i) in such a way that its generalized root marginal does not change too much, but computing it is much easier.

III. Truncating the Tree 

BP can be seen as an example of the approach mentioned at the end of the previous section. In this case SAW(i) is replaced by the first t generations of the computation tree, to be denoted by T(i;t). In this case the complexity of evaluating the generalized root marginal scales as t rather than as |T(i;t)|.

A different idea is to cut some of the branches of SAW(i) so as to reduce its size drastically. We will call truncation the procedure of cutting branches of SAW(i). It is important to keep in mind that truncation is different from the termination of branches when a loop is closed in $\mathcal{G}$. While termination is completely defined, we are free to define truncation to get as good an algorithm as we want. In the following we shall define truncation schemes parametrized by an integer t, and denoted as SAW(i;t). We will have SAW(i;t) = SAW(i) for t > n, thus recovering the exact marginal by Proposition 2.

In order for the algorithm to be efficient, we need to ensure the following constraints: (i) SAW(i;t) is 'small enough' (as the complexity of computing its generalized root marginal is at most linear in its size); (ii) SAW(i;t) is 'easy to construct.' For coding applications, this second constraint is somewhat less restrictive because the tree(s) SAW(i;t) can be constructed in a preprocessing stage and not recomputed at each use of the code.

In order to achieve the second goal, we must define the partition $D(u) = D_{1,t}(u) \cup \cdots \cup D_{k,t}(u)$ of the children of u according to the subtree SAW(i;t) used in the computation. Consider two children of u, which we denote by $v_1, v_2 \in D(u)$. In a similar way as for SAW(i) in the complete tree case, we write them as $v_1 = (u, j_1)$, $v_2 = (u, j_2)$, and define $v_1 \approx_t v_2$ if there exists a descendant $v'_1$ of $v_1$ in SAW(i;t) such that $\pi(v'_1) = j_2$, or a descendant $v'_2$ of $v_2$ such that $\pi(v'_2) = j_1$. The construction of the partition $\{D_{1,t}(u), \dots, D_{k,t}(u)\}$ will differ depending on whether communication takes place over erasure or general channels.

A. Weitz's Fixed Depth Scheme and its Problems 

The truncation procedure proposed in [2] amounts to 
truncating all branches of SAW(i) at the same depth t, unless 
they are already terminated at smaller depth. Variables at 
depth t (boundary) are forced to take arbitrary values. 

The rationale for this scheme comes from the 'strong spatial mixing' property that holds for the system studied in [2]. Namely, if we denote by $\mu_{i|t}(x_i | x_t)$ the normalized marginal distribution at the root i given the variable assignment $x_t$ at depth t, we have

$$\big\| \mu_{i|t}(\cdot\,|\,x_t) - \mu_{i|t}(\cdot\,|\,x'_t) \big\|_{\rm TV} \le A \lambda^t, \tag{17}$$

uniformly in the boundary conditions $x_t, x'_t$, for some constants $A > 0$, $\lambda \in [0, 1)$.

It is easy to realize that the condition in Eq. (17) generically does not hold for 'good' sparse graph codes. The reason is that fixing the codeword values at the boundary of a large tree normally determines their values inside the same tree. In other words, the TP estimates strongly depend on this boundary condition.

One can still hope that some simple boundary condition 
might yield empirically good estimates. An appealing choice 
is to leave 'free' the nodes at which the tree is truncated. This 
means that no node potential is added on these boundary ver- 
tices. We performed numerical simulations with this scheme 
on the same examples considered in the next section. The 
results are rather poor: unless the truncation level t is very 
large (which is feasible only for small codes in practice) the 
bit error rate is typically worse than under BP decoding. 

B. Improved Truncation Schemes: Erasure Channel 

For decoding over the BEC, a simple trick remarkably improves the performance of TP decoding. First, construct a subtree of SAW(i) of depth at most t by truncating at the deepest variable nodes whose depth does not exceed t. The partition $\{D_{1,t}(u), \dots, D_{k,t}(u)\}$ is constructed using the equivalence classes of the transitive closure of the relation defined above. Then run ordinary BP on this graph, upwards from the leaves towards the root, and determine all messages in this direction. If a node u is decoded in this way, fix it to the corresponding value and further truncate the tree SAW(i) at this node.

The resulting tree SAW(i;t) is not larger than the one resulting from fixed-depth truncation. For low erasure probabilities it is in fact much smaller than the latter.
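The BEC trick can be sketched as an upward pass of erasure BP followed by pruning at decoded variable nodes. In this Python sketch the tree representation, the convention that check-node names start with 'c', and the treatment of leaves without channel information as erasures are all our own assumptions; the construction of the partition is not reproduced.

```python
def bec_upward(children, channel, root):
    """Upward BP over the BEC on a rooted tree.

    children: dict node -> list of child nodes; node names starting with 'c' are check
    nodes, all others are variable nodes (a naming convention of this sketch only).
    channel: dict variable node -> 0, 1, or None (erasure).
    Returns dict node -> upward value (0 or 1), or None if the node is still erased.
    """
    up = {}

    def visit(u):
        kids = [visit(v) for v in children.get(u, [])]
        if u.startswith("c"):
            # check node: known only if every child message is known; value = parity
            val = None if None in kids else sum(kids) % 2
        else:
            # variable node: channel value, else any known message from a child check
            known = [m for m in kids if m is not None]
            val = channel.get(u)
            if val is None and known:
                val = known[0]
        up[u] = val
        return val

    visit(root)
    return up


def prune_decoded(children, up, root):
    """Truncate the tree below every decoded non-root variable node (its value is fixed)."""
    pruned, stack = {}, [root]
    while stack:
        u = stack.pop()
        if u != root and not u.startswith("c") and up[u] is not None:
            pruned[u] = []                       # decoded: keep as a leaf fixed to up[u]
            continue
        pruned[u] = list(children.get(u, []))
        stack.extend(pruned[u])
    return pruned


# Hypothetical example tree: the subtree below v2 resolves v2 = 1, which then decodes the root.
children = {"r": ["c1", "c2"], "c1": ["v1", "v2"], "v2": ["c3"], "c3": ["v5", "v6"],
            "c2": ["v3"], "v3": ["c4"], "c4": ["v4"]}
channel = {"r": None, "v1": 1, "v2": None, "v5": 0, "v6": 1, "v3": None, "v4": None}
up = bec_upward(children, channel, "r")
pruned = prune_decoded(children, up, "r")
print(up["r"], pruned["v2"])   # 0 []
```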

C. Improved Truncation Schemes: General Channel

The above trick cannot be applied to general BM channels. We therefore resort to the following constructions.

(i) Construction MAP(i;t): Define the distance d(i,j) between two variable nodes i, j to be the number of check nodes encountered along the shortest path from i to j. Let B(i;t) be the subgraph induced³ by the variable nodes whose distance from i is at most t. Then we let SAW(i;t) = MAP(i;t) be the complete self-avoiding walk tree for the subgraph B(i;t). This corresponds to truncating SAW(i) as soon as the corresponding self-avoiding walk exits B(i;t). No forcing self-potential is added on the boundary.
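The ball B(i;t) used by this construction can be computed by breadth-first search, where each step moves between two variable nodes sharing a check, so that the number of steps equals the number of checks crossed. The Python sketch below returns the variable nodes within distance t together with the checks of the induced subgraph (checks involving only those variables). Names and the ring-shaped example code are hypothetical.

```python
from collections import deque


def ball(H, i, t):
    """Variable nodes at distance <= t from i (distance = checks on a shortest path),
    and the check nodes of the induced subgraph B(i;t)."""
    m, n = len(H), len(H[0])
    var_nb = [[a for a in range(m) if H[a][j]] for j in range(n)]   # checks of each variable
    chk_nb = [[j for j in range(n) if H[a][j]] for a in range(m)]   # variables of each check
    dist = {i: 0}
    queue = deque([i])
    while queue:
        j = queue.popleft()
        if dist[j] == t:
            continue
        for a in var_nb[j]:
            for k in chk_nb[a]:
                if k not in dist:
                    dist[k] = dist[j] + 1      # one more check crossed
                    queue.append(k)
    U = set(dist)
    checks = [a for a in range(m) if set(chk_nb[a]) <= U]   # checks involving only U
    return sorted(U), checks


# Hypothetical example: a ring of 6 bits with checks x_a + x_{a+1 mod 6}.
H = [[1 if j in (a, (a + 1) % 6) else 0 for j in range(6)] for a in range(6)]
print(ball(H, 0, 1))   # ([0, 1, 5], [0, 5])
print(ball(H, 0, 2))   # ([0, 1, 2, 4, 5], [0, 1, 4, 5])
```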

A nice property of this scheme is that it returns the a posteriori estimate of the transmitted bit $x_i$ given the channel outputs within B(i;t); call it $\mu_i^{B(i;t)}$. As a consequence, many reasonable performance measures (bit error probability, conditional entropy, etc.) are monotone in t [11].

On the negative side, the size of the tree MAP(i; t) grows 
very rapidly (doubly exponentially) with t at small t. This 
prevented us from using t > 3. 

(ii) Construction MAP(i;t)-BP(ℓ): The tree MAP(i;t) constructed as in the previous approach is augmented by adding some descendants to those nodes that are terminated in MAP(i;t). More precisely, below any such node u in the mentioned tree, we add the first ℓ generations of the computation tree.

(iii) Construction SAW(i;t): We can implement a finer scheme for the general BM case. This scheme operates on SAW(i;t) obtained by truncating all branches of SAW(i) at the same depth t. The description of this method is somewhat lengthy. We omit the details here and present the numerical results in Fig. [T] of Section IV.

IV. Numerical simulations 

For communication over the BEC, the implementation of TP decoding described in Section III-B is satisfactory. While it is simple enough for practical purposes, it permits us to depict performance curves that interpolate successfully between BP and MAP decoding. The binary erasure channel, which we denote by BEC(ε) if the erasure probability is ε, is appealing for a first study for the following reasons: (i) the accessibility of performance curves under MAP decoding allows for a careful study of the new algorithm; (ii) the TP decoder turns out to be 'robust' with respect to changes in the truncation method, hence simpler to study.

As an example of a generic BM channel, we shall consider the binary-input additive white Gaussian noise channel with standard deviation σ, which we denote by BAWGN(σ²).

³The subgraph induced by a subset U of variable nodes is the one including all those check nodes that only involve variables in U.



Let us stress that the TP decoder is not (in general) symmetric with respect to codewords. This somewhat complicates the analysis (and simulations), which have to be performed for uniformly random transmitted codewords.

A. Binary Erasure Channel 

The erasure case is illustrated by three examples: a
tail-biting convolutional code, the (23, 12) Golay code, and a
sparse graph code. Here TP is compared with BP run to
convergence (an 'infinite' number of iterations) and with MAP
decoding implemented through Gaussian elimination.
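On the BEC, BP run to convergence is equivalent to the peeling decoder, and MAP decoding amounts to solving the parity-check equations for the erased bits by Gaussian elimination over GF(2). The Python sketch below is our own illustration of these two reference decoders, not the authors' implementation; the function names and the (7, 4) Hamming code example are hypothetical. The erased positions {0, 1, 2} form a stopping set, so peeling makes no progress, whereas the corresponding columns of H are linearly independent and MAP recovers all three bits.

```python
def bec_peeling_decode(H, y):
    """BP on the BEC (peeling): repeatedly solve any check with exactly one erased bit.
    H is a list of 0/1 rows; y holds 0, 1 or None (erasure)."""
    x = list(y)
    progress = True
    while progress:
        progress = False
        for row in H:
            unknown = [j for j, h in enumerate(row) if h and x[j] is None]
            if len(unknown) == 1:
                j = unknown[0]
                x[j] = sum(x[k] for k, h in enumerate(row) if h and k != j) % 2
                progress = True
    return x


def bec_map_decode(H, y):
    """MAP erasure decoding by Gaussian elimination over GF(2) on the erased columns.
    An erased bit is recovered iff it takes the same value in every solution."""
    n = len(y)
    erased = [j for j in range(n) if y[j] is None]
    k = len(erased)
    # Augmented rows [H_E | s], where s is the contribution of the received bits.
    rows = [[row[j] for j in erased]
            + [sum(row[j] * y[j] for j in range(n) if y[j] is not None) % 2]
            for row in H]
    pivots, r = [], 0  # pivots holds (row index, erased-column index)
    for c in range(k):
        p = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if p is None:
            continue  # free column: this bit cannot be determined
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append((r, c))
        r += 1
    x = list(y)
    for i, c in pivots:
        # In reduced echelon form, x_c is fixed iff no free column appears in its row.
        if not any(rows[i][cc] for cc in range(k) if cc != c):
            x[erased[c]] = rows[i][k]
    return x


H_HAMMING = [[1, 1, 1, 0, 1, 0, 0],
             [0, 1, 1, 1, 0, 1, 0],
             [1, 1, 0, 1, 0, 0, 1]]
codeword = [1, 0, 1, 1, 0, 0, 0]
received = [None, None, None, 1, 0, 0, 0]
print(bec_peeling_decode(H_HAMMING, received))  # [None, None, None, 1, 0, 0, 0]
print(bec_map_decode(H_HAMMING, received))      # [1, 0, 1, 1, 0, 0, 0]
```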

The TP decoder allows us to plot a sequence of performance
curves, indexed by the truncation parameter t. In all of the
cases considered, TP improves over BP already at small
values of t. As t increases, TP eventually comes very close
to MAP. The gain is particularly clear for codes with many
short loops, and at low noise. This confirms the expectation
that, when truncated, TP effectively takes care of small
'pseudocodewords.'

The first example is a memory-two, rate-1/2 convolutional
code in tailbiting form with blocklength n = 100. The
performance curves are shown in Fig. 3. The TP and BP
decoders are based on a periodic Tanner graph associated with
the tailbiting code with generator pair (1 + D², 1 + D + D²).
More precisely, they are based on the graph representing the
parity-check matrix with circulant horizontal pattern 110111.
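The circulant structure can be made explicit with a short check that the pattern 110111 annihilates tail-biting codewords. The Python sketch below is our own illustration with hypothetical helper names. It assumes that row j places the pattern at columns 2j, ..., 2j+5 (mod n) and that, at each time step, the output of 1 + D + D² precedes the output of 1 + D²; this bit ordering is our assumption. With the opposite ordering, the same check holds for the time-reversed pattern 111011. Drawing the information bits uniformly at random also yields the uniformly random codewords needed for simulation.

```python
import random

PATTERN = (1, 1, 0, 1, 1, 1)  # circulant horizontal pattern "110111"


def circulant_parity_check(n, pattern=PATTERN):
    """(n/2) x n matrix whose row j holds `pattern` starting at column 2j (mod n)."""
    if n % 2:
        raise ValueError("blocklength must be even for a rate-1/2 code")
    H = []
    for j in range(n // 2):
        row = [0] * n
        for offset, bit in enumerate(pattern):
            row[(2 * j + offset) % n] = bit
        H.append(row)
    return H


def cyclic_convolve(u, g):
    """(u * g)_k = sum_m g_m u_{k-m}, with indices taken mod len(u) (tail-biting)."""
    L = len(u)
    return [sum(g[m] * u[(k - m) % L] for m in range(len(g))) % 2 for k in range(L)]


def tailbiting_encode(u, g1=(1, 0, 1), g2=(1, 1, 1)):
    """Encode with g1 = 1 + D^2, g2 = 1 + D + D^2; assumed order: g2 output, then g1 output."""
    c1, c2 = cyclic_convolve(u, g1), cyclic_convolve(u, g2)
    x = []
    for a, b in zip(c2, c1):
        x.extend([a, b])
    return x


def syndrome(H, x):
    return [sum(h & xi for h, xi in zip(row, x)) % 2 for row in H]


rng = random.Random(1)
H = circulant_parity_check(100)
x = tailbiting_encode([rng.randint(0, 1) for _ in range(50)])
print(not any(syndrome(H, x)))  # True: x satisfies every parity check
```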
Fig. 3. Tailbiting convolutional code with generator pair
(1 + D², 1 + D + D²) and blocklength n = 100. Black curve:
BP decoding with t = ∞. Red curve: MAP decoding (BP
and Gaussian elimination). Blue curves: BP decoding with t =
3, 4, 5, 6, 8, 10, 12, 14 (almost indistinguishable). Red curves: TP
decoding with t = 3, 4, 5, 6, 8, 10, 12, 14 (truncated tree).



The second example is the standard (perfect) (23, 12) Golay
code with blocklength n = 23. The results are shown in Fig. 4.
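The qualifier 'perfect' can be checked with a one-line sphere-packing computation. It uses the standard facts that the (23, 12) Golay code has 2^12 codewords and minimum distance 7, so it corrects up to 3 errors. The Hamming balls of radius 3 around the codewords then fill {0,1}^23 exactly:

```latex
2^{12}\sum_{i=0}^{3}\binom{23}{i}
  = 2^{12}\,(1 + 23 + 253 + 1771)
  = 2^{12}\cdot 2^{11}
  = 2^{23}.
```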

The third example, a regular (3, 6) LDPC code with blocklength
n = 50, is depicted in Fig. 5.

B. Binary-Input Additive White Gaussian Noise Channel 

In the case of the BAWGN channel, we consider a single
code, the tail-biting convolutional code used above, and two
truncation schemes, constructions (ii) and (iii) described in
Section III-C.
Fig. 4. (23, 12) Golay code with blocklength n = 23. Blue curve:
BP decoding with t = ∞. Black curve: MAP decoding (BP and
Gaussian elimination). Blue curves: BP decoding with t = 4, 5, 6.
Red curves: TP decoding with t = 4, 5, 6 (truncated tree).
Fig. 5. A regular (3, 6) LDPC code with blocklength n = 50. Blue
curve: BP decoding with t = ∞. Black curve: MAP decoding (BP
and Gaussian elimination). Blue curves: BP decoding with t = 7, 8.
Red curves: TP decoding with t = 7, 8 (truncated tree).



Our results are shown in Figs. 6 and 7. The TP and
BP decoders are based on the natural periodic Tanner graph
associated with the tailbiting code. We run BP for a large number
of iterations and check that the error probability is roughly
independent of the number of iterations. The MAP decoder is
implemented by running BP on the single-cycle tailbiting trellis
(i.e., BCJR on a ring [7], [9], [10]).

We observe that the two schemes MAP(i;t) - BP(ℓ)
and SAW(i;t), the latter with t = 8, outperform BP.
Unfortunately, due to complexity constraints we were limited
to small values of t and therefore could not approach the
actual MAP performance.

V. Theoretical Implications 

One interesting direction is to use the self-avoiding walk 
tree construction for analysis purposes. We think in particular 
of two types of developments: (i) a better understanding of 
the relation between BP and MAP decoding, and (ii) a study 
of the 'inherent hardness' of decoding sparse graph codes. 

While the first point is self-explanatory, it may be useful
to say a few words about the second. The most important
outcome of the theory of iterative coding systems can be
phrased as follows.

Fig. 6. Tailbiting convolutional code with generator pair
(1 + D², 1 + D + D²) and blocklength n = 50. Dashed black curve: BP
decoding with t = 400. Black curve: MAP decoding (wrap-around
BCJR). Blue curves: BP decoding with t = 8, 50. Red curves:
TP decoding with t = 8 (truncated tree, denoted by TP(i;8)), TP
decoding on a ball of radius 2 (scheme (ii) with no BP processing,
denoted by MAP(2)), and TP decoding according to scheme (ii)
(with parameters as indicated by MAP(i;2) - BP(50)).

Fig. 7. Tailbiting convolutional code with generator pair
(1 + D², 1 + D + D²) and blocklength n = 50. Blue curve: BP decoding
with t = 400. Black curve: MAP decoding (wrap-around BCJR).
Red curve: TP decoding according to scheme (iii) (with parameter
as indicated by SAW(i; t = 8), using a suitable truncated tree).

    There exist families of graphs (expanders [12],
    random graphs [13]) with diverging size and bounded
    degree such that, below a certain noise level, MAP decoding
    can be achieved in linear time up to a 'small error.'

We believe that (a formal version of) the same statement holds
for any family of graphs with bounded degree. For the erasure
channel, this can be proved.

Proposition 3. Let $\{G_n\}$ be a family of Tanner graphs of
diverging blocklength $n$, with maximum variable degree $l$
and check degree $r$. Consider communication over BEC($\epsilon$)
with $\epsilon < 1/((l-1)(r-1))$. Then, for any $\delta > 0$, there
exists a decoder whose complexity is of order $n\,\mathrm{Poly}(1/\delta)$
and which returns estimates $\{\hat{x}_1(y), \hat{x}_2(y), \dots, \hat{x}_n(y)\}$
such that, for each $i$,

$$\mathbb{P}\{\hat{x}_i(y) \neq x_i^{\rm MAP}(y)\} \le \delta.$$



Proof. The decoder returns the MAP estimate of $x_i$
given the subgraph $B(i;t)$ and the values received
therein. Consider the subgraph $G_n(y)$ of $G_n$ obtained by
removing the non-erased bits. The proof consists of an
elementary percolation estimate on this graph; see [14].

It is easy to see that $\mathbb{P}\{\hat{x}_i(y) \neq x_i^{\rm MAP}(y)\}$ is upper bounded
by the probability that the connected component of $G_n(y)$
that contains $i$ is not contained in $B(i;t)$. This is in turn
upper bounded by the number of paths between $i$ and a vertex
at distance $t+1$ (which is at most $l(l-1)^t(r-1)^t$) times the
probability that one such path is completely erased (which
is $\epsilon^{t+1}$). Therefore, for $A = l\epsilon > 0$ and $\lambda = (l-1)(r-1)\epsilon < 1$,
we get $\mathbb{P}\{\hat{x}_i(y) \neq x_i^{\rm MAP}(y)\} \le A\lambda^t$. The proof is
completed by taking $t = \log(A/\delta)/\log(1/\lambda)$, and noticing
that $B(i;t)$ can be decoded in time polynomial in its size,
which is polynomial in $1/\delta$ (the ball contains at most of order
$((l-1)(r-1))^t$ nodes, and $t = O(\log(1/\delta))$). The computation is
repeated for each $i \in \{1, \dots, n\}$, whence the factor $n$. □
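The choice of $t$ in the proof can be made concrete numerically. The sketch below uses a hypothetical helper name and is only an illustration of the bound, not a decoder. It rounds $t = \log(A/\delta)/\log(1/\lambda)$ up to an integer and returns the resulting bound $A\lambda^t$, with $A = l\epsilon$ and $\lambda = (l-1)(r-1)\epsilon$ as above. For a (3, 6) graph at $\epsilon = 0.05$ we have $\lambda = 1/2$, so each factor of 10 in $\delta$ costs roughly 3.3 extra levels of the ball.

```python
import math


def local_radius(l, r, eps, delta):
    """Radius t from the proof of Proposition 3 (rounded up to an integer) and the
    resulting bound A * lam**t on P{x_i(y) != x_i^MAP(y)}."""
    lam = (l - 1) * (r - 1) * eps
    if not 0 < lam < 1:
        raise ValueError("Proposition 3 needs 0 < eps < 1/((l-1)(r-1))")
    A = l * eps
    t = max(0, math.ceil(math.log(A / delta) / math.log(1 / lam)))
    return t, A * lam ** t


for delta in (1e-2, 1e-3, 1e-4):
    print(delta, local_radius(3, 6, 0.05, delta))
```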

We believe that a strengthening (better dependence on the
precision $\delta$) and a generalization (to other channel models) of
this result can be obtained using the self-avoiding walk tree
construction.